Abstract: Modeling and understanding BitTorrent 
(BT) dynamics is a recurrent research topic mainly 
due to its high complexity and tremendous practical 
efficiency. Over the years, different models have un- 
covered various phenomena exhibited by the system, 
many of which have direct impact on its perfor- 
mance. In this paper we identify and characterize a 
phenomenon that has not been previously observed: 
homogeneous peers (with respect to their upload 
capacities) experience heterogeneous download rates. 
The consequences of this phenomenon have direct 
impact on peer and system performance, such as 
high variability of download times, unfairness with 
respect to peer arrival order, bursty departures 
and content synchronization. Detailed packet-level 
simulations and prototype-based experiments on the 
Internet were performed to characterize this phe- 
nomenon. We also develop a mathematical model 
that accurately predicts the heterogeneous download 
rates of the homogeneous peers as a function of their 
content. Although this phenomenon is more prevalent 
in unpopular swarms (very few peers), these by far 
represent the most common type of swarm in BT. 

I. Introduction 

Peer-to-peer (P2P) applications have been widely 
used for content retrieval on the Internet. Among 
them, BitTorrent (BT) [1] is one of the most 
popular, used by millions daily to retrieve millions 
of files (movies, TV series, music, etc.), accounting 
for a large fraction of today's Internet traffic [2]. 
The mainstream success of BT is closely related 
to its performance (e.g., fast download times), 
which, together with its high complexity, has 
triggered the interest of researchers. 

Understanding and characterizing the performance 
of BT through mathematical models has 
been an active topic of research [3]. Several studies 
have uncovered peculiar aspects of BT's dynamics, 
many of which have a direct impact on system 
performance. Moreover, models that capture user 
and system performance under homogeneous and 
heterogeneous peer populations (with respect to 
their upload capacities) have been proposed for 
various scenarios [4]-[7]. However, most proposed 
models target large-scale systems, either with a 
large and fixed initial peer population or relatively 
high arrival rates. 

We consider a BT swarm where all peers have 
identical upload capacities but unconstrained (or 
large) download capacities. In this context, we 
identify and characterize a phenomenon that has 
not been previously observed: homogeneous peers 
experience heterogeneous download rates. This is 
surprising because peers are identical and should 
thus exhibit similar average performance and be- 
cause it has not been captured by any prior model 
(to the best of our knowledge). Moreover, this ob- 
servation has several important implications, such 
as high variability of download times, unfairness 
with respect to peer arrival order, bursty departures 
and content synchronization among the peers. Two 
peers are said to be content synchronized when their 
contents become identical at a given instant. This 
last consequence is particularly critical since it is 
closely related to the missing piece syndrome [8]. 

We characterize the fact that homogeneous peers 
experience heterogeneous download rates and its 
various consequences by using detailed packet- 
level simulations and prototype-based experiments 
on the Internet. To pinpoint the parameters that 
are critical for this behavior, we consider various 
scenarios. We also develop a mathematical model 
that explains the phenomenon and predicts the 
heterogeneous download rates of the homogeneous 
peers as a function of their content. The comparison 
of model predictions with simulation results 
indicates that the model is quite accurate. More 
importantly, the model sheds light on the key 
insight behind this behavior: the upload capacity 
allocation of peers in BT depends fundamentally on 
the piece interest relationship, which for unpopular 
swarms can be rather asymmetric. 

Finally, the phenomenon we identify is more 
prevalent in swarms that have a very small peer 
population and usually a single seed (peer with 
entire content) with limited bandwidth. However, 
this is by far the most prevalent kind of swarm in 
BT [9]. Measurement studies indicate that more 
than 35% of the swarms have fewer than 5 peers at 
any point in time. Thus, we focus our attention on 
unpopular swarms. 

The rest of this paper is organized as follows. 
In Section II we present a brief overview of BT and 
motivate the phenomenon we have identified. In 
Section III we characterize the phenomenon and its 
consequences using simulations and experiments 
with a real BT application. Section IV presents our 
mathematical model and its validation. In Section V 
we apply the model to make predictions about 
bursty departures. We extend our discussion and 
present some related work in Sections VI and VII, 
respectively. Finally, we conclude the paper in 
Section VIII. 

II. BT Overview and the Observed Behavior 

In this section we briefly describe the BT proto- 
col and identify an unexpected behavior common 
in unpopular swarms. 

A. Brief BT overview 

BT is a swarm-based file sharing P2P application. 
A swarm is a set of users (peers) interested in 
downloading and/or sharing the same content (a 
single file or a bundle of files). The content is 
chopped into pieces (chunks), which are exchanged 
among peers connected to the swarm. The entities 
in a swarm may be of three different types: (i) the 
Seeds, which are peers that have a complete copy 
of the content and are still connected to the system, 
altruistically uploading data to other peers; (ii) the 
Leechers, which are peers that have not yet fully 
retrieved the content and are actively downloading 
and simultaneously uploading chunks; and (iii) 
the Tracker, which is a kind of swarm coordinator 
that keeps track of the leechers and seeds connected 
to the swarm. 

Periodically, the Tracker distributes lists with a 
random subset of peers connected to the swarm 
to promote the interaction among participating 
peers. In a first interaction, two peers exchange 
their bitmaps (a list of all file chunks they have 
downloaded). Any later update to their bitmaps 
must be reported by the leecher. 



In order to receive new chunks, the leecher 
must send "Interested" messages to all peers that 
announced having the wanted pieces in their 
respective bitmaps. Because of the "rarest first" 
approach specified in the BT protocol, leechers 
prioritize downloading the chunks that are scarcest 
in the swarm. Once a sub-piece of any chunk is 
received, the "strict priority" policy requires that 
the remaining sub-pieces of that particular chunk 
be requested before starting the download of 
any other chunk. 
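The rarest-first rule described above can be illustrated with a minimal sketch. It assumes that bitmaps are plain Python sets of piece indices and that ties among equally rare pieces are broken at random; the function name `rarest_first` and the data layout are illustrative choices, not part of the BT protocol or of any client.

```python
import random
from collections import Counter


def rarest_first(my_pieces, neighbor_bitmaps, rng=random):
    """Return the index of a missing piece held by the fewest neighbors.

    my_pieces:        set of piece indices this leecher already has.
    neighbor_bitmaps: dict mapping a neighbor id to its set of pieces.
    Returns None when no neighbor holds a piece we still need.
    """
    availability = Counter()
    for bitmap in neighbor_bitmaps.values():
        for piece in bitmap - my_pieces:  # only pieces we are missing
            availability[piece] += 1
    if not availability:
        return None
    rarest = min(availability.values())
    candidates = sorted(p for p, c in availability.items() if c == rarest)
    return rng.choice(candidates)  # random tie-break (an assumption)


neighbors = {"seed": {0, 1, 2, 3}, "A": {0, 1}, "B": {1, 2}}
print(rarest_first({1}, neighbors))  # 3: only the seed holds piece 3
```

In a small swarm with a single seed, the rarest pieces are typically those that only the seed holds, which is why fresh pieces pushed by the seed matter so much in the rest of the analysis.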

Whenever an "Interested" message is received, 
a peer has to decide whether to "unchoke" that 
leecher and serve the piece or to "choke" the peer 
and ignore the request. Leechers preferentially 
upload content to other leechers that reciprocate 
likewise, following the "tit-for-tat" incentive strategy 
defined by BT's protocol. However, a minor fraction 
of a leecher's bandwidth must be dedicated to 
altruistically serving leechers that have never 
reciprocated. This policy, referred to as "optimistic 
unchoke", is useful for leechers to bootstrap new 
reciprocity relationships. As seeds do not 
reciprocate, they adopt the "optimistic unchoke" 
approach all the time. These BT policies were 
designed with the main purpose of giving all 
leechers a "fair share" of bandwidth. This means 
that peers uploading at higher rates should 
download at higher rates, and in a population of 
leechers uploading at the same rate, all should 
reach equal download rates. 

B. The observed behavior 

Having presented BT's mechanisms, we now 
illustrate the heterogeneous download rate phe- 
nomenon and its consequences with two simple 
examples. Consider a swarm formed by a seed and 
5 leechers. All peers, including the single seed, 
have identical upload capacity (64 kBps), but large 
(unconstrained) download capacity. The leechers 
download a file containing 1000 pieces (256MB) 
and exit the swarm immediately after download 
completion. The seed never leaves the swarm. 
This system was evaluated using a detailed packet- 
level simulator of BT and also an instrumented 
implementation of BT running on PlanetLab [10]. 
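To put the numbers of this example in perspective, the following back-of-the-envelope sketch computes the piece size, the time a leecher would need if it were fed by the seed alone, and the time under an ideal equal split of all upload capacity. It assumes 1 MB = 1000 kB and is only a rough reference calculation, not a result from the simulations or experiments.

```python
FILE_KB = 256 * 1000   # 256 MB file, assuming 1 MB = 1000 kB
NUM_PIECES = 1000
UPLOAD_KBPS = 64       # upload capacity of the seed and of every leecher
NUM_LEECHERS = 5

piece_kb = FILE_KB / NUM_PIECES

# A leecher served only by the seed, with the seed's full capacity.
seed_only_time_s = FILE_KB / UPLOAD_KBPS

# Ideal equal split of the aggregate upload capacity (seed + leechers).
fair_share_kbps = (1 + NUM_LEECHERS) * UPLOAD_KBPS / NUM_LEECHERS
fair_share_time_s = FILE_KB / fair_share_kbps

print(piece_kb)                     # 256.0 kB per piece
print(seed_only_time_s)             # 4000.0 s
print(round(fair_share_kbps, 1))    # 76.8 kBps
print(round(fair_share_time_s, 1))  # 3333.3 s
```

The gap between these two reference times gives a sense of how much a leecher's download time can vary depending on how much capacity it obtains from other leechers.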

Figures 1a and 1b show the evolution of the 
swarm size as a function of time for both 
simulation and experimental results and two 
different leecher arrival patterns. In Figure 1a, 
peers leave the swarm in the order they arrived 
(i.e., FIFO) and have relatively similar download 
times. Thus, the download time is relatively 
insensitive to arrival order (with the exception of 
the first peer). 
Fig. 2. Evolution of the number of downloaded pieces. 



Figure 1b shows the same metric for different 
arrival times (in fact, the inter-arrival times 
of peers are mostly preserved). Surprisingly, 
an unexpected behavior can be observed in the 
system dynamics: despite the significant difference 
in arrival times, all five leechers completed their 
respective downloads at nearly the same time. The 
inter-departure times are small compared to the 
download time, which characterizes bursty 
departures. This means that peers that arrive later 
to the swarm have a smaller download time. In fact, 
the fifth peer completed the download in about half 
the time of the first leecher. Thus, the system 
is quite unfair with respect to the arrival order 
of leechers, with late arrivals being significantly 
favored. What is happening? Why does BT exhibit 
such dynamics? We answer these questions in the 
next sections. 

III. Heterogeneity in Homogeneous BT Swarms 

In order to understand the unexpected behavior 
exhibited by BT in Figure 1b, we analyze the total 
number of pieces each leecher has downloaded over 
time. Consider Figures 2a and 2b, where each curve 
indicates the total number of pieces downloaded by 
a given peer for the corresponding scenario in 
Figures 1a and 1b, respectively. Note that the slope 
of each curve corresponds to the respective 
leecher's download rate. 

We start by considering Figure 2a. Although the 
slope of the first leecher is smaller than that 
of the remaining peers, the curves never meet. In 
particular, a leecher finishes the download (and 
leaves the swarm) before the next leecher reaches 
its number of pieces. We also note that all other 
leechers have very similar slopes. In addition, we 
observe a peculiar behavior: the slope of the fifth 
leecher suddenly decreases when it becomes the 
only leecher in the system. 



The results illustrated in Figure 2b, which 
correspond to the scenario considered in Figure 1b, 
show a very different behavior. Several interesting 
observations can be drawn from this figure. The 
slope of the first peer is practically constant, 
remaining unchanged by the arrival of other peers. 
The slope of all other peers is larger than that of the 
first peer, meaning the curves may eventually meet. 
When two curves meet, the corresponding leechers 
have the same number of pieces and possibly the 
same content (we will comment on this point in 
the following section). The figure also shows that 
a younger peer does not overtake the first peer; 
instead, the two maintain the same number of 
downloaded pieces after the joining point, possibly 
with their contents synchronized. Finally, the slopes 
of the second, third and fourth peers are rather 
similar. However, the slope of the fifth peer is 
slightly larger than the others, meaning a higher 
download rate and consequently a smaller download 
time. 

In summary, we make the following general 
observations: 

• The first leecher downloads at an approximately 
constant rate. 

• Subsequent leechers download at a faster rate 
than the first. 

• Once a leecher reaches the total number of 
pieces downloaded by the first leecher, their 
download rates become identical. 

• Once a leecher reaches the total number of 
pieces downloaded by the first leecher, the 
download rates of the other leechers increase. 

All these observations are related to the dynamics 
of BT and will be discussed and explained in 
Section [IV] using a simple mathematical model. 
In the remainder of this section, we discuss the 
consequences of the observed phenomenon and 
illustrate that it happens even when peer arrival is 
random (i.e., Poisson process). 



A. Consequences of heterogeneity in homogeneous 
swarms 

The observations above essentially imply that the 
download times of peers can be quite different, 
despite their homogeneous upload capacity. In 
summary, the consequences are: 

• Variability in download times. Since peers 
can experience consistently different download 
rates, their download times can also differ. 

• Unfairness with respect to peer arrival 
order. Since peers' download rates, and thus 
download times, may depend on their arrival 
order, the system is inherently unfair, 
potentially benefiting latecomers to a swarm. 

• Content synchronization. Due to different 
download rates and BT's piece selection 
mechanisms (most notably rarest-first), leech- 
ers can synchronize on the number of pieces 
they have and, more strongly, on the content 
itself. This means that peers may end up 
with exactly the same content at some instant, 
despite arriving at different points of time. 

• Bursty departures. A direct consequence of 
content synchronization is bursty departures. 
This means that peers tend to leave the swarm 
within a small interval despite arriving at the 
swarm at relatively far apart instants. 

Although the figures do not show content 
synchronization explicitly, the first leecher 
downloads the file at the same rate at which the 
seed pushes new pieces into the swarm. Hence, 
whenever a leecher reaches the same number of 
pieces as the first leecher, the two have exactly the 
same content. 

Of course, the prevalence of the phenomenon 
and its consequences depends directly on the 
parameters of the swarm. In particular, the arrival 
times of peers are certainly the most decisive. 
However, parameters such as the upload capacity 
of the seed and leechers and the number of pieces 
are also fundamentally important. Intuitively, a file 
with a larger number of pieces or a seed with a 
lower upload capacity increases the probability that 
the consequences above occur. In fact, for any 
arrival order of a small set of peers, one can always 
find system parameters for which this behavior and 
its consequences occur. 

B. Heterogeneity under Poisson arrivals 

The behavior above does not require determin- 
istic arrivals or any crafted leecher arrival pattern. 
It arises even when arrival patterns are random. 
In this section we characterize the consequences 
of the heterogeneous download rates phenomenon 
under Poisson arrivals. 
Fig. 3. Average download time as a function of arrival order 
in a busy period. 



We conducted a large number of evaluations 
using detailed packet-level simulations. In 
particular, we consider a BT swarm where a single 
seed is present at all times, while leechers arrive 
according to a Poisson process and depart the 
swarm as soon as their download is completed. In 
the evaluation that follows, all leechers have the 
same upload capacity of 64 kBps (and very large 
download capacities) and download a file with 1000 
pieces. The upload capacity of the seed ($c_s$) takes 
the values 48 kBps, 64 kBps, and 96 kBps, and the 
leecher arrival rate ($\lambda$) is 1/1000 peers/s. 
These scenarios generate a swarm that has a time 
average size of 3.7, 3.4 and 3.0 leechers, 
respectively. 
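As a rough consistency check on these figures (not part of the original evaluation), Little's law relates the time-average number of leechers to the mean download time, T = N / λ, assuming the swarm is in steady state and leechers leave as soon as they finish. The sketch below applies it to the three reported swarm sizes; the variable names are illustrative.

```python
MEAN_INTERARRIVAL_S = 1000.0  # 1 / lambda, with lambda = 1/1000 peers/s

# Time-average swarm sizes reported for each seed capacity (kBps).
mean_swarm_size = {48: 3.7, 64: 3.4, 96: 3.0}


def mean_download_time(mean_leechers, mean_interarrival_s):
    """Little's law: T = N / lambda = N * (1 / lambda)."""
    return mean_leechers * mean_interarrival_s


for c_s, n_bar in mean_swarm_size.items():
    t = mean_download_time(n_bar, MEAN_INTERARRIVAL_S)
    print(f"c_s = {c_s} kBps -> mean download time ~ {round(t)} s")
```

Under these assumptions the implied mean download times are about 3700 s, 3400 s and 3000 s, so a faster seed shortens the average download, as expected.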

We start by characterizing the variability in the 
download times and the unfairness with respect 
to leecher arrival order. Figure 3 illustrates the 
average download time for leechers as a function 
of their arrival order in a busy period. Thus, the 
i-th arrival of a busy period is mapped to index i. 
The different curves correspond to different upload 
capacities of the seed. The results clearly indicate 
that the download time depends on leecher arrival 
order. In particular, for the case $c_s = 64$ kBps, 
the average download time tends to decrease with 
increasing arrival order, and so the first arrival has 
the largest average download time. Moreover, the 
download time differences are significant and 
can reach up to 30% (e.g., the difference between 
the first and fourth arrivals). 
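The mapping from leechers to arrival indices within a busy period can be sketched as follows. The sketch assumes that a busy period is a maximal interval during which at least one leecher is present, and that arrival and departure times are available as parallel lists; both function names are hypothetical helpers, not part of the simulator.

```python
from collections import defaultdict


def busy_period_indices(arrivals, departures):
    """Return, for each leecher, its arrival order within its busy period."""
    order = sorted(range(len(arrivals)), key=lambda k: arrivals[k])
    indices = [0] * len(arrivals)
    busy_end = float("-inf")  # latest departure seen in the current period
    position = 0
    for k in order:
        if arrivals[k] > busy_end:  # swarm was empty: new busy period
            position = 0
        position += 1
        indices[k] = position
        busy_end = max(busy_end, departures[k])
    return indices


def mean_download_time_by_index(arrivals, departures):
    """Average download time grouped by arrival index in a busy period."""
    groups = defaultdict(list)
    for k, idx in enumerate(busy_period_indices(arrivals, departures)):
        groups[idx].append(departures[k] - arrivals[k])
    return {idx: sum(v) / len(v) for idx, v in sorted(groups.items())}


arrivals = [0, 100, 200, 5000, 5100]
departures = [4000, 4010, 4020, 9000, 9005]
print(busy_period_indices(arrivals, departures))        # [1, 2, 3, 1, 2]
print(mean_download_time_by_index(arrivals, departures))
# {1: 4000.0, 2: 3907.5, 3: 3820.0}
```

The toy input is made up for illustration; it mimics the slow-seed case, where later arrivals in a busy period finish in less time than the first.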

Figure 3 also indicates that variability in 
download times strongly depends on the seed upload 
capacity. In particular, a fast seed yields the reverse 
effect: leechers' download times tend to increase 
with arrival order. Intuitively, when a slow seed is 
present, late arrivals to a busy period obtain large 
download rates from other leechers, thus exhibiting 
a lower download time. However, when a fast seed 
is present, the first leecher has the larger upload 
capacity of the seed to itself until the second 
arrival, thus exhibiting a lower download time. The 
results also illustrate second-order effects. For 
instance, a very late arrival can have an average 
download time slightly larger (or smaller) than a 
late arrival (e.g., the sixth leecher arrival has a 
longer download time than the fourth for 
$c_s = 64$ kBps). Intuitively, this occurs because a 
very late arrival is likely to be alone in the busy 
period, having to resort to the seed to finish the 
download. Since the upload capacity of the seed can 
be smaller (larger) than the aggregate download 
rate it receives from other leechers, its download 
time can increase (decrease). This behavior and its 
consequences will be explained and captured by the 
mathematical model presented in the next section. 

Fig. 4. Empirical CCDF of the inter-peer departure time 
conditioned on a busy period (x-axis: departure interval (s); 
curves for $c_s$ = 48, 64 and 96, each with $c_l$ = 64 and 
$1/\lambda$ = 1000). 

Fig. 5. Mean number of leechers and mean number of 
synchronized leechers (x-axis: mean interarrival time (s); 
curves "average" and "synchronized" for $c_s$ = 64, $c_l$ = 64). 

In what follows we characterize the burstiness of 
the leecher departure process. Figure 4 shows the 
empirical CCDF (Complementary Cumulative 
Distribution Function) of the leecher inter-departure 
times conditioned on a busy period (i.e., not 
including the inter-departure time between the last 
leecher in a busy period and the first leecher of the 
next). Note that the peer inter-arrival times follow 
an exponential distribution with rate 1/1000. 
However, the results indicate a very distinct 
departure process. In particular, many peers tend to 
leave the swarm at roughly the same time: up to 
30% of peers leave the swarm within a couple of 
seconds of each other (when $c_s = 64$ kBps). 
Moreover, the departure process also exhibits high 
variability: some inter-departure times are as much 
as ten times the average (when $c_s = 64$ kBps). 
The figure also clearly shows that this observation 
strongly depends on the seed upload capacity, and 
is more pronounced when the seed is slow. 
Intuitively, a slower seed increases the average 
download time, thus increasing the chances that 
leechers synchronize their content during the 
download and depart almost at the same time. 
Finally, we also note that a fast seed yields a 
much less bursty departure process, although still 
favoring shorter inter-departure times. 

Fig. 6. Box-plot of download time of leechers for different 
average inter-arrival times. 
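The inter-departure statistic discussed above for the departure process can be computed with a short sketch. It assumes departure times have already been grouped by busy period, so that gaps spanning two busy periods are excluded; the helper names and the toy data are illustrative only.

```python
def inter_departure_times(departures):
    """Gaps between consecutive departures within one busy period."""
    ordered = sorted(departures)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def empirical_ccdf(samples):
    """Return (x, P[X > x]) for each distinct sample value x."""
    n = len(samples)
    return [(x, sum(1 for s in samples if s > x) / n)
            for x in sorted(set(samples))]


# Toy departure times (s) for two busy periods; values are made up.
busy_periods = [[4000.0, 4002.0, 4004.0, 5004.0], [9000.0, 9010.0]]
gaps = [g for period in busy_periods for g in inter_departure_times(period)]
print(gaps)                  # [2.0, 2.0, 1000.0, 10.0]
print(empirical_ccdf(gaps))  # [(2.0, 0.5), (10.0, 0.25), (1000.0, 0.0)]
```

A bursty departure process shows up as a CCDF that drops sharply at very small gaps while keeping a long tail, which is the shape described for the slow-seed case.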

One consequence of the heterogeneous download 
rates that is closely related to the bursty 
departures is content synchronization. Figure 5 
illustrates the intensity of such synchronization for 
different arrival rates. It shows the average number 
of leechers in the system and the average number 
of those that are synchronized. Here, we call 
synchronized those leechers that are interested in 
no more than 50 pieces (5% of the file) of any 
other leecher. We observe that the number of 
synchronized leechers remains practically the same 
as we increase the inter-peer arrival time, indicating 
that a larger fraction of peers have similar content 
when popularity decreases. 

We next consider the influence of the leecher 
arrival rate on the download times, independently 
of arrival order. Figure 6 shows a box plot of 
the download times of peers as a function of the 
average inter-peer arrival time (i.e., the inverse of 
the arrival rate), for $c_s = 64$ kBps. For each 
scenario, the box plot indicates the minimum, 25th 
percentile, average, 75th percentile and maximum 
download times. Note that when the inter-arrival 
time is large (2000 or 2500), the 75th percentile is 
very close to the maximum download time, 
indicating that many peers have similar download 
times. As the average inter-arrival time decreases, 
this concentration near the maximum diminishes 
significantly. However, the variability between 
minimum and maximum download time does not 
diminish with the inter-arrival time. In addition, we 
ran simulations for different values of $c_s$ and 
observed that a faster seed also has a strong 
influence on this behavior, yielding much less 
concentrated download times (omitted for 
conciseness). 

C. Real experimental evaluation 

The results shown above were all obtained 
through simulations. We now present results 
from prototype-based experiments deployed in 
more realistic scenarios. The experiments were 
performed on the Internet using machines from 
PlanetLab [11] running an instrumented version 
of a BT client [10]. Although a large number of 
experiments were conducted, we report only on a 
limited set of these results due to space constraints. 
The goal here is to validate the phenomenon of 
heterogeneity in homogeneous BT swarms and its 
consequences in a real BT application running over 
the Internet. 

We consider only private swarms in the 
experiment, in the sense that only peers controlled 
by the experiment can connect to the swarm for 
uploading and downloading content. Each private 
swarm consists of a single file of size $S$ MB, which 
is owned by a single seed that is always available 
and has upload capacity $c_s$. Leechers interested 
in downloading the content arrive to the swarm 
according to a Poisson process with rate $\lambda$. 
All leechers that arrive to the swarm are 
homogeneous and have upload capacity equal to 
$c_l$. Each experiment run is executed for 
t = 5,000 seconds. 

We start by analyzing the evolution of the swarm 
size for an unpopular swarm. Figure 7 shows the 
number of leechers in the swarm over time for 
the duration of the experiment, with parameters 
$\lambda = 1/125$ peers/s, $S = 20$ MB, and 
$c_s = c_l = 50$ kBps. We can observe several 
occurrences of bursty departures, even though 
leechers arrive according to a Poisson process. As 
previously discussed, bursty departures are a 
consequence of content synchronization among the 
leechers in the swarm. 

Using the same experiment as above, we 
investigate the impact of the leechers' arrival order 
on their download times. Figure 8 illustrates the 
dynamics of the swarm, where each horizontal line 
corresponds to the lifetime of a leecher in the 
swarm, starting when the peer arrives and ending 
when it departs the swarm. Note that peers exhibit 
significantly different download times (which 
correspond to their lifetimes in the system). In 
particular, in many cases leechers arrive at different 
time instants but depart in the same burst. For 
instance, the fifth leecher to arrive to the swarm 
departs in a burst together with all four prior 
arrivals. Thus, the fifth leecher has a much smaller 
download completion time than the first leecher. 
Similar behavior occurs between the fifteenth 
leecher and the three leechers that arrived 
immediately before. Besides illustrating the 
variability of the download times, this observation 
also indicates the unfairness with respect to leecher 
arrival order. In particular, late arrivals to a busy 
period tend to have smaller download times. 

Fig. 7. Swarm size from real experiment. 

Fig. 8. Dynamics of the swarm: leechers' arrivals and 
departures. 

We now focus on the distribution of the leechers' 
download times to illustrate their relatively high 
variability. Figures 9a and 9b show the 
complementary cumulative distribution function 
(CCDF) of download times computed for two 
experiments with distinct upload capacities for the 
seed ($c_s = 50$ kBps and $c_s = 60$ kBps, 
respectively, with all other parameters the same). In 
both results, download times exhibit a high 
variance, as shown in the figures. In the case 
$c_s = 50$ kBps (Figure 9a), the minimum and 
maximum values are 145 and 480 seconds, 
respectively, with the maximum being more than 
three times the minimum. When the upload capacity 
of the seed is higher than that of the leechers, 
Figure 9b shows that the variance in download 
times decreases, as expected, since the system 
capacity is increased. Finally, we note several 
discontinuities (i.e., sharp drops) in both CCDF 
curves, which are caused by sets of leechers that 
have approximately the same download time. 

Fig. 9. CCDF of download time from real experiments. 

IV. Model 

We develop a simple model aimed at 
understanding the origin of the heterogeneous 
download times and its consequences. Our model 
obtains an approximation to the average upload and 
download rates observed by each leecher over 
different time intervals. 

Consider a homogeneous swarm of an unpopular 
content with a single seed to which leechers arrive 
sequentially and depart as soon as they complete 
their download, such as the one illustrated in 
Figure 1a. In this scenario, bursty departures can 
only happen if younger leechers obtain roughly 
the same number of pieces as older ones, and 
leave the swarm at about the same instant. This 
in turn implies that younger leechers must have 
higher download rates than older ones, at least for 
some periods of time. Why is that? At a given 
moment, an older leecher i may have all pieces 
owned by a younger leecher j. Thus, leecher j's 
uplink capacity will be used by other leechers until 
j receives a piece that i does not have. During 
this period of time, j simply cannot serve i, even 
if it has no other leecher to serve. Therefore, the 
sets of pieces owned by each leecher are the root 
cause of heterogeneous download rates and must 
be considered. 

In order to capture the observation above, each 
peer, either a seed or a leecher, is represented by 
a queueing system with multiple queues (see 
Figure 10a), one for each neighbor, under a 
processor sharing discipline. Queue i of peer j 
contains the pieces interesting to peer i (i.e., all 
pieces that j has and i lacks). When peer i 
downloads one of these pieces, either from j or 
from some other peer, this piece is removed from 
this queue, as well as from all other queues where 
the piece was present. On the other hand, whenever 
a peer downloads a piece that other neighbors are 
interested in, this piece is placed in the queues 
corresponding to those neighbors, increasing their 
queue sizes. Finally, the queues of the seed always 
contain all pieces that are needed by the leechers. 
As a leecher downloads pieces from the seed and 
other leechers, its queue at the seed decreases, 
eventually becoming empty when the leecher 
downloads the entire content and departs the swarm. 
We note that the order in which these pieces are 
served from these queues depends on the piece 
selection policy, but is not important for our 
discussion. 
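The queue contents described above follow directly from set differences between bitmaps. The minimal sketch below assumes bitmaps are Python sets of piece indices and recomputes all queues from scratch after each change; the peer names and the function `interest_queues` are illustrative, and a real implementation would update queues incrementally.

```python
def interest_queues(bitmaps):
    """queues[j][i] = pieces that peer j has and peer i lacks."""
    return {
        j: {i: bitmaps[j] - bitmaps[i] for i in bitmaps if i != j}
        for j in bitmaps
    }


# A seed with a 4-piece file, an older leecher L1 and a younger leecher L2.
bitmaps = {"seed": {0, 1, 2, 3}, "L1": {0, 1, 2}, "L2": {0, 1}}

queues = interest_queues(bitmaps)
print(queues["L2"]["L1"])    # set(): L2 cannot serve the older L1
print(queues["L1"]["L2"])    # {2}:   L1 can serve the younger L2

# L2 obtains piece 3 from the seed, which L1 still lacks.
bitmaps["L2"].add(3)
queues = interest_queues(bitmaps)
print(queues["L2"]["L1"])    # {3}:   L2 now has something for L1
print(queues["seed"]["L2"])  # {2}:   L2's queue at the seed shrank
```

The example shows the asymmetry at the heart of the model: a younger leecher's queue for an older one is empty until a fresh piece arrives, typically from the seed.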

Let $c_s$ and $c_l$ be the seed's and leechers' 
uplink capacities, respectively. Assume that the 
leechers' downlink capacities are much larger than 
$c_s$ or $c_l$. Let $N(t)$ be the number of leechers 
in the system at time $t$. Since the seed always has 
interesting pieces for every leecher, all the $N(t)$ 
queues in the seed are backlogged. Thus, all queues 
will be served at rate $c_s/N(t)$. Note that, since the 
swarm is unpopular, we assume the swarm size is 
small enough that every leecher is a neighbor of 
every other peer, including the seed. 

A leecher may not have interesting pieces for 
some of its neighbors at time $t$. Let a leecher 
be identified by its arrival order; thus leecher $i$ 
is the $i$-th leecher to join the swarm. Also let 
$n_i(t) \le N(t) - 1$ be the number of leechers 
interested in pieces owned by $i$. The instantaneous 
upload rate from $i$ to any of these leechers is 
$c_l/n_i(t)$. 
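The instantaneous rates defined above can be evaluated for a given set of bitmaps with a small sketch. It assumes bitmaps are sets of piece indices keyed by leecher arrival order and that capacity is split equally among interested leechers, as under processor sharing; the function names are illustrative.

```python
def seed_rate(c_s, num_leechers):
    """Rate at which the seed serves each of its backlogged queues."""
    return c_s / num_leechers


def leecher_upload_rates(bitmaps, c_l):
    """rates[i][j] = c_l / n_i for every leecher j interested in i."""
    rates = {}
    for i, pieces_i in bitmaps.items():
        interested = [j for j, pieces_j in bitmaps.items()
                      if j != i and pieces_i - pieces_j]
        rates[i] = {j: c_l / len(interested) for j in interested}
    return rates


# Three leechers with nested content (leecher 1 is the oldest).
bitmaps = {1: {0, 1, 2}, 2: {0, 1}, 3: {0}}
print(seed_rate(64, len(bitmaps)))
print(leecher_upload_rates(bitmaps, 64))
# {1: {2: 32.0, 3: 32.0}, 2: {3: 64.0}, 3: {}}
```

With nested content, the youngest leecher uploads nothing while receiving from everyone, so equal upload capacities do not translate into equal download rates.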

Whether or not a leecher has pieces interesting 
to another depends on the leechers' respective 
bitmaps, i.e., the current subsets of pieces owned 
by each leecher. The set of bitmaps of all leechers 
would precisely determine the exact pieces in each 
queue. However, the dynamics of the bitmaps are 
intricate, and keeping track of them would be 
unnecessarily complicated for modeling the 
phenomenon we are interested in. Instead, we 
consider the number of pieces owned by each 
leecher $i$, $b_i(t), \forall i$, and infer whether a 
leecher has interesting pieces for other leechers. 

For the sake of simplicity, let $b_i(t) = b_i$, 
$N(t) = N$ and $n_i(t) = n$, $\forall i$. Two 
remarks can be made with respect to $b_i$ and the 
interest relationship among leechers: 

Remark 1. If $b_i > b_j$, then $i$ has at least 
$b_i - b_j$ interesting pieces for $j$. 

Remark 2. If $0 < b_i \le b_j$, it is impossible to 
determine whether or not $i$ has interesting 
pieces for $j$ without further information. 
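Both remarks follow from elementary set counting. In the short derivation below, $B_i$ denotes the set of pieces owned by leecher $i$, so that $b_i = |B_i|$; this set notation is introduced here only for illustration.

```latex
% Remark 1: pieces of i that j lacks, when b_i > b_j.
\begin{align*}
|B_i \setminus B_j| &= |B_i| - |B_i \cap B_j| \\
                    &\ge |B_i| - |B_j| \qquad \text{since } |B_i \cap B_j| \le |B_j| \\
                    &= b_i - b_j .
\end{align*}

% Remark 2: with b_i = b_j = 2, both outcomes are possible.
\begin{align*}
B_i = \{1, 2\},\; B_j = \{1, 2\} &\;\Rightarrow\; |B_i \setminus B_j| = 0, \\
B_i = \{1, 2\},\; B_j = \{3, 4\} &\;\Rightarrow\; |B_i \setminus B_j| = 2 .
\end{align*}
```

The piece counts alone therefore give a guaranteed lower bound on interest in one direction only, which is why the model treats the two cases differently.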

In the following, we will use these two remarks 
to derive a simple model to capture the upload and 
download rates between the peers. With respect to 
Remark 2, we will assume no further information is 
available, and hence the piece interest relationship 
among peers will be ignored in this case. 

A. A simple fluid model 

We assume that the content is a fluid, or 
equivalently, that its pieces can be subdivided into 
infinitely many parts that can be exchanged 
(uploaded and downloaded) continuously. 

To simplify the explanation, assume that 
$b_1 > b_2 > \cdots > b_N$, i.e., an older leecher 
has strictly more pieces than a younger one. We 
assume that if leecher $i$ has joined the swarm after 
$j$, i.e., $i > j$, $i$ can still upload pieces to $j$ as 
long as $i$ downloads pieces from any peer $k$ that 
has more pieces than $j$, i.e., $k < j$. We also 
assume that every piece downloaded from the seed 
by a leecher is immediately interesting to all other 
leechers, independent of their age. This assumption 
is justified by the rarest first piece selection 
policy used in BT. 

Since the seed's upload capacity is $c_s$, each 
leecher downloads from it at rate $c_s/N$. Now let 
$g_{ij}$ be the rate at which peer $i$ could 
potentially upload data to peer $j$ if there were no 
capacity constraints (i.e., independently of the 
upload and download capacities of peers $i$ and 
$j$, respectively). If a leecher $i$ is older than $j$, 
$i$ has interesting pieces for $j$. Therefore, from 
the perspective of the multiple queueing system, 
queue $j$ in leecher $i$ is backlogged and 
$g_{ij} = \infty$. On the other hand, if $i$ is younger 
than $j$, the rate $g_{ij}$ is given by the rate at 
which $i$ downloads pieces interesting to $j$. 
According to the previous assumptions, this rate is 
equal to the rate at which peers older than $j$ 
upload to peer $i$. Adding this to the rate at which 
peer $i$ downloads from the seed, we thus have: 

$$g_{ij} = \frac{c_s}{N} + \sum_{k<j} u_{ki}, \qquad i > j, \tag{1}$$

where $u_{ki}$ is the rate at which leecher $k$ 
uploads to $i$. 

We now make an important observation concerning Equation (1). Consider
leecher $i$ and some other leecher $j$. The older $j$ is with respect to
$i$, the smaller the rate at which $i$ can upload to $j$, that is, the
smaller $g_{ij}$ is. If $j$ is younger than $i$, then $g_{ij} = \infty$.
This observation implies that

$$ g_{i1} \le g_{i2} \le \dots \le g_{iN}. $$

Since the upload capacity of peers is finite, we must now determine how
the capacity of a given peer $i$ will be divided to serve each of the
leechers. In particular, recall that $u_{ij}$ is the upload rate from
peer $i$ to peer $j$ and note that $u_{ij} \le c_l$, where $c_l$ is the
upload capacity of a leecher. To determine $u_{ij}$ we will use $g_{ij}$
and a bandwidth allocation mechanism that follows a progressive filling
algorithm, as illustrated in Figure 10(b). Roughly, infinitesimal
amounts of bandwidth are allocated to each leecher until no available
bandwidth remains or one or more leechers are satisfied with respect to
the $g_{ij}$, $\forall j$, constraints. In the latter case, it continues
to distribute the capacity among the non-satisfied leechers. The final
bandwidth allocation for leecher $i$ can be obtained by computing the
following
equation in the order $j = 1, \dots, N$:

$$ u_{ij} = \min\left( g_{ij},\; \frac{c_l - \sum_{k<j} u_{ik}}{n - |\{k \mid k < j,\, k \neq i\}|} \right) \tag{2} $$

Fig. 11. Example of matrix $U = (u_{ij})$ showing the right order of
calculation.



where $|A|$ is the cardinality of a set $A$. Recall from Equation (1)
that $g_{ij}$ depends on $u_{1,i}, \dots, u_{j-1,i}$, for $i > j$. By
calculating $u_{ij}$ in the order $i = 1, \dots, N$, we ensure that
every variable in Equation (2) has been previously computed.

As an example, consider the calculation of the matrix $U = (u_{ij})$,
which determines upload rates between peers at a given moment, for a
small swarm containing a single seed and $N = 3$ leechers. Let their
upload capacities be equal to $c_s = 60$ kBps and $c_l = 96$ kBps,
respectively, and assume $b_1 > b_2 > b_3$. Matrix $U$ and the order of
computation of its elements are depicted in Figure 11. The download rate
for peer $i$ is simply the sum of the elements in column $i$.
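
The computation for this example can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: leechers are indexed from oldest (0) to youngest (N - 1), every leecher is assumed to be connected to all others so that $n = N - 1$, and the seed's share $c_s/N$ is added to each column sum when reporting download rates. The function names are illustrative, and the values produced need not coincide with those shown in Figure 11.

```python
import math


def upload_matrix(N, c_s, c_l):
    """Upload rates u[i][j] from leecher i to leecher j (0 = oldest)."""
    n = N - 1  # assumption: each leecher has all other leechers as neighbors
    u = [[0.0] * N for _ in range(N)]
    for i in range(N):          # uploaders, oldest first
        allocated = 0.0         # capacity of i already given to older receivers
        served = 0              # number of receivers already handled by i
        for j in range(N):      # receivers, oldest first
            if j == i:
                continue
            if i < j:
                g = math.inf    # i is older than j: queue j is backlogged
            else:
                # Eq. (1): seed share plus what peers older than j upload to i
                g = c_s / N + sum(u[k][i] for k in range(j))
            share = (c_l - allocated) / (n - served)
            u[i][j] = min(g, share)  # Eq. (2)
            allocated += u[i][j]
            served += 1
    return u


def download_rates(u, c_s):
    """Column sums of u plus the seed's per-leecher rate (assumption)."""
    N = len(u)
    return [c_s / N + sum(u[i][j] for i in range(N)) for j in range(N)]


U = upload_matrix(3, 60.0, 96.0)
print(U)                        # [[0.0, 48.0, 48.0], [20.0, 0.0, 76.0], [20.0, 68.0, 0.0]]
print(download_rates(U, 60.0))  # [60.0, 136.0, 144.0]
```

Under these assumptions the oldest leecher downloads at exactly the seed rate, while the two younger leechers obtain higher rates despite having the same upload capacity.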

Equation (2) corroborates the idea that homogeneous peers can exhibit
heterogeneous upload rates which depend on the number of pieces owned by
the leechers. Moreover, the younger leechers tend to have a higher
download rate, as they obtain a higher upload rate from other leechers.

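Eventually the number of pieces owned by a leecher may reach the number
of pieces owned by an older one. In particular, this is bound to occur
since younger leechers tend to have a higher download rate. In this
case, these two leechers will no longer have pieces interesting to each
other. Thus, Equations (1) and (2) must be rewritten as functions of
$b_i$, $\forall i$:

$$ g_{ij} = c_s/N + \sum_{k \mid b_k > b_j} u_{ki}, \quad b_i \le b_j, \tag{3} $$

$$ u_{ij} = \min\left( g_{ij},\; \frac{c_l - \sum_{k \mid b_k > b_j} u_{ik}}{n - |\{k \mid b_k > b_j,\, k \neq i\}|} \right). \tag{4} $$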
Intuitively, Equation (4) combines the two constraints on the rate at
which $i$ uploads pieces to $j$. The first term stands for the maximum
instantaneous rate irrespective of capacity limitations. The second term
reflects the fraction of $i$'s uplink capacity that can be dedicated to
$j$ given that some bandwidth has already been allocated. In this case,
$c_l - \sum_{k \mid b_k > b_j} u_{ik}$ is the remaining capacity of $i$
and $n - |\{k \mid b_k > b_j,\, k \neq i\}|$ is the number of peers that
will share it (including $j$).

B. Model Validation 

Our model gives an approximation to the average download rate
experienced by a leecher in a swarm, which depends on the relationship
between the numbers of pieces owned by the peers. In this section, we
validate the model by comparing its predictions with simulation results.

We consider a homogeneous swarm containing $N$ leechers with
$c_s = c_l$. In this scenario, it is reasonable to assume that
$b_1 \ge b_2 \ge \dots \ge b_N$ if the index reflects the peer arrival
order. We partition the set of leechers into two subsets: leechers with
the same number of pieces as the oldest leecher (subset A), and those
with fewer pieces than the oldest one (subset B). In the scenario
considered, the model predicts that all leechers in a subset will have
identical download rates. Moreover, a leecher in B will have a higher
download rate than one in A, and this difference depends on the set
sizes. In the following, we compare the average download rate of peers
in each of these sets with simulation results.
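
These predictions can be checked numerically with a small Python sketch. It assumes the tie-aware form of the model described in Section IV-A, in which the sums and the set in the denominator run over peers $k$ with $b_k > b_j$, and it again assumes $n = N - 1$ and adds the seed's share $c_s/N$ to every leecher's download rate. The function name and the piece counts used below are hypothetical.

```python
import math


def model_rates(b, c_s, c_l):
    """Model download rate of each leecher given its piece count b[i]."""
    N = len(b)
    n = N - 1  # assumption: each leecher has all other leechers as neighbors
    order = sorted(range(N), key=lambda i: -b[i])  # most pieces first
    u = [[0.0] * N for _ in range(N)]
    for i in order:
        for j in order:
            if j == i:
                continue
            # peers holding strictly more pieces than j
            older = [k for k in range(N) if b[k] > b[j]]
            if b[i] > b[j]:
                g = math.inf  # i certainly has pieces that interest j
            else:
                g = c_s / N + sum(u[k][i] for k in older)
            remaining = c_l - sum(u[i][k] for k in older if k != i)
            sharers = n - sum(1 for k in older if k != i)
            u[i][j] = min(g, remaining / sharers)
    return [c_s / N + sum(u[i][j] for i in range(N)) for j in range(N)]


# Homogeneous swarm (c_s = c_l = 1), N = 4, subset A = first two leechers.
print(model_rates([10, 10, 5, 3], 1.0, 1.0))
# Same swarm size with a single leecher in subset A.
print(model_rates([10, 5, 3, 1], 1.0, 1.0))
```

In both runs the leechers in subset A obtain rate $c_s$, and all leechers in subset B share a common, higher rate, which is larger when subset A contains two leechers than when it contains one.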

We use deterministic arrivals to reproduce the exact scenarios we intend
to compare. For a swarm with $N$ leechers such that $n_A$ of these
belong to partition A (i.e., have $b_1$ pieces), the arrivals are set as
follows: the first $n_A$ arrivals occur next to each other; after they
have roughly the same number of pieces, i.e., $|b_i - b_1| < 3$, the
other $N - n_A$ leechers join the swarm sequentially and far apart. We
then compute the average download rate experienced by a leecher in
subset A and by a leecher in B, over a large time interval but before
any departures.

We have simulated 5 runs for each scenario. The confidence intervals
obtained are relatively small and will be omitted. The results for
$1 < N \le 5$ and $1 \le n_A < N$ are presented in Figures 12(a,b).
Figure 12(a) shows simulation and model results for leechers in A. The
average download rate of a leecher in A predicted by the model for this
scenario does not depend on $N$ or $n_A$ and is represented by the
horizontal line. Note that the model is quite accurate, despite the
various configurations for $N$ and $n_A$. In particular, the relative
error is less than 1% for all scenarios.
Fig. 12. Simulation results for $c_s = c_l = 0.25$.



Figure 12(b) shows the average download rate for leechers in B. Since
there are numerous points showing either simulation or model results, we
use '+' to identify simulations and 'x' to identify model results
(except for $N = 2$, where a circle and a square are used,
respectively). In addition, to ease the comparison of these points,
there are lines connecting results of the same type (simulation or
model) for the same value of $N$. We note that the model is quite
accurate, with differences being unnoticeable in many scenarios and less
than 10% in all cases. More importantly, the model captures well the
behavior observed in simulation. For a fixed $N$, as the number of
leechers in A increases, the average download rate of leechers in B
grows. On the other hand, for a fixed $n_A$, the average download rate
decreases with $N$. Finally, a larger number of leechers in the swarm
implies a larger range of possible download rates for leechers in B,
since $n_A$ can vary from 1 to $N$.

V. Predicting bursty departures 

The model presented in Section IV can be used to estimate the number of
departures that occur in a burst. In particular, consider the arrival of
a leecher that initiates a busy period (i.e., the first arrival after
the swarm had no leechers). In the following, we estimate the average
number of peers that depart the swarm in a burst together with the
leecher that initiated the busy period.

Let $l$ denote the first leecher of a busy period and assume that
leecher arrivals follow a Poisson process with rate $\lambda$. Also, as
assumed by the model, a seed is always present and has uplink capacity
$c_s$. Finally, let $S$ denote the number of pieces of the content.

According to the model, the first leecher, $l$, will download the entire
content at a fixed rate equal to $c_s$, independently of the number of
peers in the swarm. Note that $c_s$ is also the upper bound on the
average download rate, since the seed cannot push new pieces into the
network at a faster rate. Thus, $l$ will take $T = S/c_s$ seconds to
finish the download.

Consider arrivals that occur while peer $l$ is in the swarm. The number
of such arrivals, say $N$, is a random variable that follows the Poisson
distribution with mean $\lambda T$. The download rates of these leechers
are a function of $N$ and also of their instants of arrival. Moreover,
as discussed in Section IV-B, larger values of $N$ imply a larger spread
in the download rates (see Figure 12(b)). To obtain conservative lower
and upper bounds on these download rates, we will consider a
sufficiently large value for $N$. In particular, we use the 99-th
percentile of $N$, namely $N_{99}$, and thus,

$$ P[N \le N_{99}] \ge 0.99. $$
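
As an illustration, the 99-th percentile of a Poisson random variable can be computed by accumulating its probability mass function. The sketch below uses only the Python standard library; the function name `poisson_percentile` and the numerical values in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def poisson_percentile(mean, q=0.99):
    """Smallest n such that P[N <= n] >= q for N ~ Poisson(mean)."""
    pmf = math.exp(-mean)  # P[N = 0]
    cdf = pmf
    n = 0
    while cdf < q:
        n += 1
        pmf *= mean / n    # P[N = n] obtained from P[N = n - 1]
        cdf += pmf
    return n


# Hypothetical values: arrival rate 1/1000 per second, download time T = 2000 s.
lam, T = 1 / 1000, 2000.0
print(poisson_percentile(lam * T))  # 6
```

The direct use of `math.exp(-mean)` is adequate for the small means that arise in unpopular swarms; very large means would require a numerically safer formulation.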

Given that exactly $N_{99}$ leechers will join the swarm before the
departure of $l$, we can use the model to obtain the minimum and maximum
download rates of these peers, independent of their inter-arrival
timing. Let $d_{min}$ and $d_{max}$ be, respectively, the minimum and
the maximum download rates obtained from the model given that the swarm
has $N_{99} + 1$ leechers. Thus, the minimum and maximum times for the
leechers to obtain the content are, respectively, $S/d_{max}$ and
$S/d_{min}$.

Therefore, at least all leechers that arrive before $T - S/d_{min}$ will
leave the swarm together in a burst with $l$. The expected number of
peers that will arrive within this time period, $B_{min}$, is simply
given by

$$ B_{min} = \lambda \left( T - \frac{S}{d_{min}} \right). \tag{5} $$

Similarly, at most all leechers that arrive before $T - S/d_{max}$ will
leave the swarm in a burst with $l$. The expected number of peers that
will arrive within this time period, $B_{max}$, is simply given by

$$ B_{max} = \lambda \left( T - \frac{S}{d_{max}} \right). \tag{6} $$

Finally, $B_{min}$ and $B_{max}$ provide a lower and an upper bound for
the average number of leechers that will depart the swarm in a burst
with $l$.

Table I. Bounds for the expected number of leechers that depart in a
burst with $l$, for $\lambda = 1/1000$.

Table I shows the expected number of arrivals to the swarm before $l$
departs, $E[N]$, which is simply $\lambda T$, and both the lower and
upper bounds $B_{min}$ and $B_{max}$, respectively. The table shows
numerical results for different $c_s$ values but with $c_l = 64$ kB/s
and $\lambda = 1/1000$. The results indicate that the average number of
peers that depart the swarm in a burst with $l$ can be significant:
between 32% and 82% of all arrivals when the seed is slower than the
leechers and between 10% and 47% when they have the same upload
capacity. We also observe that these ratios decrease as $c_s$ increases,
indicating that bursty departures are less likely to occur with fast
seeds.
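
The quantities reported in such a table follow directly from Equations (5) and (6). The Python sketch below shows the arithmetic; it treats the content size $S$ and the rates in compatible units (e.g., kB and kB/s), and it takes $d_{min}$ and $d_{max}$ as inputs rather than deriving them from the model. The function name and all numerical values are hypothetical and do not reproduce the paper's table.

```python
def burst_bounds(S, c_s, lam, d_min, d_max):
    """Return (E[N], B_min, B_max) for the first leecher of a busy period."""
    if not (0 < d_min <= d_max):
        raise ValueError("expected 0 < d_min <= d_max")
    T = S / c_s                      # download time of the first leecher
    expected_arrivals = lam * T      # E[N] = lambda * T
    b_min = lam * (T - S / d_min)    # Eq. (5)
    b_max = lam * (T - S / d_max)    # Eq. (6)
    return expected_arrivals, b_min, b_max


# Hypothetical inputs: 64000 kB of content, seed at 64 kB/s,
# one arrival per 1000 s, model rates between 80 and 128 kB/s.
e_n, b_min, b_max = burst_bounds(64000.0, 64.0, 1 / 1000, 80.0, 128.0)
print(e_n, b_min, b_max)
print(b_min / e_n, b_max / e_n)  # fraction of arrivals departing in the burst
```

With these inputs roughly 20% to 50% of the expected arrivals would depart together with the first leecher; the ratios shrink as the seed capacity approaches the leechers' download rates from below.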

VI. General discussions 

It is interesting to consider the prevalence of the 
observed phenomenon in more general scenarios. 
Although we have shown its prevalence under a 
crafted peer arrival process and under Poisson 
arrivals, we claim that homogeneous peers can have 
heterogeneous download rates under very general 
arrival patterns. In particular, given any arrival 
pattern of peers into a swarm, it is possible to 
choose system parameters (i.e., seed upload capacity, leecher upload
capacity, and file size) such that the effects described in this paper
will be very prevalent. Intuitively, by choosing a fast enough seed,
peers will not be able to disseminate old pieces before new ones are
pushed into the swarm, and thus will have significantly different
numbers of blocks, while by choosing a large enough file, peers are
bound to synchronize before they finish the
download. In a sense, the behavior observed and
described in this paper is quite general, although 
the requirement of the swarm being unpopular is 
important, as we next describe. 
Fig. 13. Number of leechers of a 'popular' swarm over time
($\lambda = 1/12$, $c_s = 50$ kB/s, $c_l = 50$ kB/s)

What happens if we consider very popular swarms, where the peer arrival
rate is very large, yielding very large swarm sizes? Figure 13 shows
experimental results of the dynamics of leecher arrivals and departures
for this scenario (Poisson arrivals with rate $\lambda = 1/12$ and
uplink capacities of $c_s = 50$ kB/s and $c_l = 50$ kB/s).
Interestingly, we can observe several of the consequences of having
heterogeneous download rates. In particular, we can observe bursty
departures, content synchronization and high variability of download
times (peers that leave in a large burst have different download times,
as arrivals are well-behaved), for example, at times 600 s and 1200 s.
In a sense, the phenomenon is quite prevalent even during the busy
period, but not strong enough to end the busy period. The
characterization and modeling of the phenomenon in this scenario is much
more involved, given the complicated dynamics of piece exchange in BT
and, consequently, of the interest relationship among peers. We leave
the investigation of these scenarios (popular swarms) as future work.

Last, we comment on the relationship between our findings and the
missing piece syndrome [8]. The key aspect of this syndrome is content
synchronization, where a large fraction of peers have all but one and
the same piece. This situation is particularly bad for the performance
of the swarm, as the departure rate of the swarm will be equal to the
seed upload capacity (assuming peers depart as soon as they acquire the
last block). Our work has shown that peers can synchronize their content
long before the last piece. In some sense, this generalizes the syndrome
to a piece synchronization syndrome, which is inherent to BT dynamics,
due to the heterogeneous download rates. Once peers have synchronized
their content, they can only acquire new pieces from the seed, at the
upload capacity of the seed. In this situation, the missing piece
syndrome is bound to occur.

VII. Related prior works 

Modeling P2P file sharing systems and in particular BT has been an
active area of research in the past few years, driven mainly by the high
complexity, robustness and user-level performance of such systems. One
of the first BT models to predict the download times of peers was
presented in [5]. This simple fluid model based on differential
equations assumes a homogeneous peer population (with respect to
download and upload capacities) and Poisson arrivals, but yields an
analytical steady-state solution. Several subsequent BT models have been
proposed in the literature to capture various system characteristics,
among them heterogeneous peer population (with respect to upload and
download capacities) [6], Q, [12]. BT performance was also studied in
the context of corporate and academic LANs where access links are often
symmetric [13]. However, to the best of our knowledge, all models
predict that identical peers (with respect to their upload capacities)
simultaneously downloading a file will have identical performance (with
respect to download rates), contrary to the findings in this paper.
Moreover, BT models generally assume either a rather large peer arrival
rate (e.g., Poisson) or a large flash crowd (all peers join the swarm at
the same time). This is somewhat surprising, given that most real BT
swarms are rather small in size and quite unpopular [9]. Finally, one
perverse effect of this lack of popularity, content unavailability, has
been shown to be a severe problem found in most BT swarms [14].

Another interesting aspect of BT has been the discovery and
characterization of some non-trivial phenomena induced by its complex
dynamics. For example, peers in a BT swarm tend to form clusters based
on their upload link capacities, exhibiting a strong homophily effect.
In particular, peers with identical upload capacities tend to exchange
relatively more data between them [15], [16]. Another interesting
observed behavior is the fact that arriving leechers can continue to
download the entire content despite the absence of any seed in the
swarm, a property known as self-sustainability [17]. More recently, a
phenomenon known as the missing piece syndrome has been identified and
characterized mathematically, which states that in large swarms of long
durations, the system can become unstable (i.e., the number of leechers
diverges to infinity) if the upload capacity of the seed is not large
enough [8]. This last phenomenon is closely related to our work and was
discussed in Section VI. Again, to the best of our knowledge, no prior
work has alluded to the phenomenon we describe in this paper, namely,
that homogeneous peers can have heterogeneous download rates.



VIII. Conclusion 

This paper identifies, characterizes and models 
an interesting phenomenon in BT: Homogeneous 
peers (with respect to their upload capacity) ex- 
perience heterogeneous download rates. The phe- 
nomenon is more pronounced in unpopular swarms 
(few leechers) and has important consequences 
that directly impact peer and system performance. 
The mathematical model proposed captures well 
these heterogeneous download rates of peers and 
provides fundamental insights into the root cause of 
the phenomenon. Namely, the allocation of system
capacity (aggregate uplink capacity of all peers)
among leechers depends on the piece interest
relationship among peers, which for unpopular swarms
is directly related to arrival order and can be
significantly different.